Clustering US Senators¶

The goal is to cluster Senators to find those who follow the mainstream of their parties and those who don't.

The dataset represents Senate roll-call votes on proposed legislation from the 114th Congress. Each row represents a Senator and each column represents a vote. A 0 in a cell means the Senator voted No on the bill, 1 means the Senator voted Yes, and 0.5 means the Senator abstained.
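
To illustrate the encoding described above, here is a minimal sketch that maps textual vote labels to the same 0 / 1 / 0.5 codes. The labels `"Yea"`, `"Nay"` and `"Not Voting"`, the `VOTE_CODES` mapping and the Senator names are assumptions made up for the example; the CSV used in this notebook is already numeric.

```python
import pandas as pd

# Hypothetical raw labels; the file loaded in this notebook is already numeric.
VOTE_CODES = {"Yea": 1.0, "Nay": 0.0, "Not Voting": 0.5}

raw = pd.DataFrame(
    {"00001": ["Nay", "Yea", "Not Voting"], "00004": ["Yea", "Nay", "Yea"]},
    index=["Senator A", "Senator B", "Senator C"],
)

# Map every cell to its numeric code, column by column.
encoded = raw.apply(lambda col: col.map(VOTE_CODES))
print(encoded)
```

Coding an abstention as 0.5 places it exactly halfway between Yes and No, so on any single roll call an abstaining Senator is equally distant from both positions.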

In [1]:

import pandas as pd
votes = pd.read_csv("114_congress.csv")
votes.head()

Out[1]:

	name	party	state	00001	00004	00005	00006	00007	00008	00009	00010	00020	00026	00032	00038	00039	00044	00047
0	Alexander	R	TN	0.0	1.0	1.0	1.0	1.0	0.0	0.0	1.0	1.0	1.0	0.0	0.0	0.0	0.0	0.0
1	Ayotte	R	NH	0.0	1.0	1.0	1.0	1.0	0.0	0.0	1.0	0.0	1.0	0.0	1.0	0.0	1.0	0.0
2	Baldwin	D	WI	1.0	0.0	0.0	1.0	0.0	1.0	0.0	1.0	0.0	0.0	1.0	1.0	0.0	1.0	1.0
3	Barrasso	R	WY	0.0	1.0	1.0	1.0	1.0	0.0	1.0	1.0	1.0	1.0	0.0	0.0	1.0	0.0	0.0
4	Bennet	D	CO	0.0	0.0	0.0	1.0	0.0	1.0	0.0	1.0	0.0	0.0	0.0	1.0	0.0	1.0	0.0

Number of Senators by party and mean vote per roll call¶

In [2]:

print(votes['party'].value_counts())
# numeric_only=True skips the name, party and state columns
print(votes.mean(numeric_only=True))

R    54
D    44
I     2
Name: party, dtype: int64
00001    0.325
00004    0.575
00005    0.535
00006    0.945
00007    0.545
00008    0.415
00009    0.545
00010    0.985
00020    0.525
00026    0.545
00032    0.410
00038    0.480
00039    0.510
00044    0.460
00047    0.370
dtype: float64
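
The overall means hide how each party voted. As a rough sketch, the same mean can be computed per party with `groupby`. The frame below is copied by hand from the first five rows of `votes.head()` and restricted to three roll calls; the names `sample` and `party_means` are introduced only for this example.

```python
import pandas as pd

# First five rows of the table above, restricted to three roll calls.
sample = pd.DataFrame(
    {
        "name": ["Alexander", "Ayotte", "Baldwin", "Barrasso", "Bennet"],
        "party": ["R", "R", "D", "R", "D"],
        "00001": [0.0, 0.0, 1.0, 0.0, 0.0],
        "00004": [1.0, 1.0, 0.0, 1.0, 0.0],
        "00006": [1.0, 1.0, 1.0, 1.0, 1.0],
    }
)

# Mean vote per roll call within each party.
party_means = sample.groupby("party")[["00001", "00004", "00006"]].mean()
print(party_means)
```

In this small sample, roll call 00004 splits the two parties completely, while 00006 is unanimous, which fits its overall mean of 0.945.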

Clustering¶

In [4]:

from sklearn.cluster import KMeans

kmeans_model = KMeans(n_clusters=2, random_state=1)
# fit_transform fits the model and returns each Senator's distance
# to each of the two cluster centers (one column per cluster)
senator_distances = kmeans_model.fit_transform(votes.iloc[:, 3:])
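
To make the output of `fit_transform` more concrete, here is a small sketch on a made-up vote matrix. The array `X` is invented for illustration, and `n_init=10` is set explicitly only to keep behavior the same across scikit-learn versions; neither comes from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy vote matrix (made up): two voting blocs of three rows each.
X = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])

model = KMeans(n_clusters=2, random_state=1, n_init=10)
distances = model.fit_transform(X)  # shape: (n_samples, n_clusters)

print(distances.shape)
# Each row's label is the index of its nearest cluster center.
print(np.array_equal(model.labels_, distances.argmin(axis=1)))
```

Each column of `distances` holds the Euclidean distance to one cluster center, so a row's cluster label is simply the column with the smaller value.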

Count how many Senators from each party ended up in each cluster.¶

In [5]:

labels = kmeans_model.labels_
print(labels)
pd.crosstab(labels , votes["party"])

[1 1 0 1 0 0 1 0 1 0 0 1 0 1 0 0 0 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 0 1 1 0 1
 0 1 1 1 0 1 1 0 1 1 1 1 0 0 1 0 1 0 1 1 0 1 0 1 0 0 0 1 1 0 0 0 1 1 0 1 0
 1 1 1 1 1 0 1 0 0 1 1 0 1 0 1 0 1 1 1 0 1 0 0 0 1 0]

Out[5]:

party	D	I	R
row_0
0	41	2	0
1	3	0	54

Cluster 0 contains 41 Democrats and both Independents. Cluster 1 contains 3 Democrats and all 54 Republicans.

It appears that 3 Democrats vote more like Republicans than like their own party. We'll explore these 3 Senators in more depth.

In [6]:

democratic_outliers = votes[(labels == 1) & (votes["party"] == "D")]
print(democratic_outliers)

        name party state  00001  00004  00005  00006  00007  00008  00009  \
42  Heitkamp     D    ND    0.0    1.0    0.0    1.0    0.0    0.0    1.0   
56   Manchin     D    WV    0.0    1.0    0.0    1.0    0.0    0.0    1.0   
74      Reid     D    NV    0.5    0.5    0.5    0.5    0.5    0.5    0.5   

    00010  00020  00026  00032  00038  00039  00044  00047  
42    1.0    0.0    0.0    0.0    1.0    0.0    0.0    0.0  
56    1.0    1.0    0.0    0.0    1.0    1.0    0.0    0.0  
74    0.5    0.5    0.5    0.5    0.5    0.5    0.5    0.5
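
One way to check whether an outlier reflects actual votes or abstentions is to measure how often each Senator's row contains 0.5. The sketch below rebuilds the three rows printed above by hand; `outliers` and `abstention_rate` are names introduced only for this example.

```python
import pandas as pd

# The three rows printed above, copied by hand.
roll_calls = ["00001", "00004", "00005", "00006", "00007", "00008", "00009",
              "00010", "00020", "00026", "00032", "00038", "00039", "00044", "00047"]
outliers = pd.DataFrame(
    [
        [0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
        [0.5] * 15,
    ],
    index=["Heitkamp", "Manchin", "Reid"],
    columns=roll_calls,
)

# Fraction of roll calls on which each Senator abstained (coded 0.5).
abstention_rate = (outliers == 0.5).mean(axis=1)
print(abstention_rate)
```

Reid's row is 0.5 on every roll call, so his placement in cluster 1 most likely reflects abstentions rather than Republican-leaning votes. Heitkamp and Manchin, by contrast, cast a vote on every roll call shown.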

In [7]:

%matplotlib inline
import matplotlib.pyplot as plt

plt.scatter(senator_distances[:,0], senator_distances[:,1], c = labels)
plt.show()

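Each point is a Senator, plotted by distance from the first cluster center (x-axis) and from the second cluster center (y-axis), and colored by cluster label.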

Let's find the most extreme Senators¶

We cube each Senator's distance to each cluster center and sum the two values. Cubing amplifies large distances, so the Senators farthest from a cluster center get the highest scores.

In [8]:

extremism = (senator_distances ** 3).sum(axis=1)
votes['extremism'] = extremism
votes.sort_values("extremism", inplace=True, ascending=False)
print(votes.head(10))

         name party state  00001  00004  00005  00006  00007  00008  00009  \
98     Wicker     R    MS    0.0    1.0    1.0    1.0    1.0    0.0    1.0   
53   Lankford     R    OK    0.0    1.0    1.0    0.0    1.0    0.0    1.0   
69       Paul     R    KY    0.0    1.0    1.0    0.0    1.0    0.0    1.0   
80      Sasse     R    NE    0.0    1.0    1.0    0.0    1.0    0.0    1.0   
26       Cruz     R    TX    0.0    1.0    1.0    0.0    1.0    0.0    1.0   
48    Johnson     R    WI    0.0    1.0    1.0    1.0    1.0    0.0    1.0   
47    Isakson     R    GA    0.0    1.0    1.0    1.0    1.0    0.0    1.0   
65  Murkowski     R    AK    0.0    1.0    1.0    1.0    1.0    0.0    1.0   
64      Moran     R    KS    0.0    1.0    1.0    1.0    1.0    0.0    1.0   
30       Enzi     R    WY    0.0    1.0    1.0    1.0    1.0    0.0    1.0   

    00010  00020  00026  00032  00038  00039  00044  00047  extremism  
98    0.0    1.0    1.0    0.0    0.0    1.0    0.0    0.0  46.250476  
53    1.0    1.0    1.0    0.0    0.0    1.0    0.0    0.0  46.046873  
69    1.0    1.0    1.0    0.0    0.0    1.0    0.0    0.0  46.046873  
80    1.0    1.0    1.0    0.0    0.0    1.0    0.0    0.0  46.046873  
26    1.0    1.0    1.0    0.0    0.0    1.0    0.0    0.0  46.046873  
48    1.0    1.0    1.0    0.0    0.0    1.0    0.0    0.0  40.017540  
47    1.0    1.0    1.0    0.0    0.0    1.0    0.0    0.0  40.017540  
65    1.0    1.0    1.0    0.0    0.0    1.0    0.0    0.0  40.017540  
64    1.0    1.0    1.0    0.0    0.0    1.0    0.0    0.0  40.017540  
30    1.0    1.0    1.0    0.0    0.0    1.0    0.0    0.0  40.017540
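
The effect of cubing can be seen on a few made-up distances. The values in `distances` below are invented for illustration and are not taken from the dataset; the sketch only compares a plain sum with the cubed sum used for the extremism score.

```python
import numpy as np

# Made-up distances to the two cluster centers for three Senators.
distances = np.array([
    [1.0, 3.0],   # close to center 0, far from center 1
    [1.5, 3.5],   # somewhat farther from both centers
    [2.0, 2.0],   # equally far from both centers
])

plain_sum = distances.sum(axis=1)
cubed_sum = (distances ** 3).sum(axis=1)

print(plain_sum.tolist())
print(cubed_sum.tolist())
```

The first and third rows have the same plain sum (4), but their cubed sums are 28 and 16. The score therefore favors Senators who are very far from one center over those who sit at a moderate distance from both.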
